Welcome back to deep learning. Today we want to continue talking about convolutional neural networks. What we really want to see in this lecture are the building blocks for constructing deep neural networks, and the convolutional layer is one of the most important of them. So far we had those fully connected layers where each input is connected to each node. This is very powerful because it can represent any kind of linear relationship between the inputs. In particular, between every two layers we have one matrix multiplication, which essentially means that from one layer to the next we can have an entire change of representation. It also means that we have a lot of connections.
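To make this concrete, here is a minimal sketch of a fully connected layer as one matrix multiplication plus a bias; the toy sizes and random weights are illustrative assumptions, not anything fixed by the lecture.

```python
import numpy as np

def fully_connected(x, W, b):
    # every input is connected to every output node:
    # one weight per connection, one bias per node
    return W @ x + b

rng = np.random.default_rng(0)
n_in, n_out = 4, 3                  # toy layer sizes
W = rng.normal(size=(n_out, n_in))  # 4 * 3 = 12 trainable weights
b = rng.normal(size=n_out)          # 3 trainable biases

x = rng.normal(size=n_in)
y = fully_connected(x, W, b)
print(y.shape)  # (3,) -- an entirely new representation of x
```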
So let's think about images, videos, and sounds in machine learning. Here this is a bit of a disadvantage, because they typically have huge input sizes, and we have to think about how to deal with those. Let's say we have an image with 512 x 512 pixels. That means that one hidden layer with 8 neurons already has (512² + 1) × 8 trainable weights, counting one extra weight per neuron for the bias. That's more than 2 million trainable weights, just for a single hidden layer. Of course, this is not the way to go, and size really is a problem. But there's more to that.
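As a quick check of that number, the arithmetic from the example above can be reproduced in a few lines of Python:

```python
pixels = 512 * 512              # inputs from a 512 x 512 image
neurons = 8                     # one small hidden layer

# each neuron needs one weight per pixel plus one bias
weights = (pixels + 1) * neurons
print(weights)                  # 2097160 -- more than 2 million
```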
Let's say we want to classify between a cat and a dog. If you look at those two images here, you can see that a large part of each image just contains empty areas, so those parts are not very relevant. Pixels in general are very bad features: they are highly correlated, scale-dependent, and subject to intensity variations. So there is a huge problem, and pixels are a bad representation from a machine learning point of view. You want to create something that is more abstract and summarizes the information better. So the question is: can we find a better representation? We have a certain degree of locality in an image, of course, so we can try to find the same macro features at different locations and then reuse them. Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes, then eyes, nose, and ears form a face, and finally face, body, and legs compose an animal. So composition matters, and if you can learn a better representation, then you can also classify better. This is really the key, and what we often see in convolutional neural networks is that the early layers find very simple descriptors, the intermediate layers find more abstract representations, such as eyes and noses, and in the higher layers you really find detectors for entire objects, for example faces. So we want to have local sensitivity, but then we want to scale it over the entire network in order to also model these layers of abstraction. We can do that by using convolutions in the neural network.
So here is the general idea of these architectures. Instead of fully connecting everything with everything, they use a so-called receptive field for every neuron that acts like a filter kernel. The same weights are then applied over the entire image, which is essentially a convolution, and this produces different so-called feature maps. Next, the feature maps go to a pooling layer.
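As an illustration of this idea, here is a minimal sketch of how one feature map could be computed with a single shared kernel; the plain-NumPy loop and the edge-detector kernel are illustrative assumptions, not the lecture's implementation.

```python
import numpy as np

def feature_map(image, kernel):
    # slide one shared kernel over the image (a 'valid' 2D correlation);
    # every output neuron only sees its local receptive field
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],   # a simple vertical-edge detector
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(feature_map(image, kernel).shape)  # (6, 6) -- one feature map
```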
The pooling layer then brings in the abstraction and shrinks the image, i.e., it reduces the resolution. After that, we can again apply convolution and pooling and go into the next stage.
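A minimal sketch of such a pooling step, assuming 2 x 2 max pooling (other window sizes and pooling functions are possible as well):

```python
import numpy as np

def max_pool_2x2(x):
    # keep the maximum of each non-overlapping 2x2 block,
    # halving the resolution in both dimensions
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))  # [[ 5.  7.]
                           #  [13. 15.]]
```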
You can do this until you have some abstract representation, and the abstract representation is then fed to a fully connected layer. In the end, this fully connected layer maps to the final classes, which are then car, truck, van, and the like. So this is the classification result. We need convolutional layers, activation functions, and pooling to get the abstraction and to reduce the dimensionality, and in the last layers we find fully connected ones for the classification.
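Putting these pieces together, a minimal sketch of such an architecture could look as follows, assuming PyTorch is available; the channel counts, the 32 x 32 input size, and the five classes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# convolution -> activation -> pooling, twice, then a fully connected
# layer that maps the abstract representation to the final classes
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),    # 8 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # 16 more abstract maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 5),                     # e.g. car, truck, van, ...
)

x = torch.randn(1, 3, 32, 32)    # one RGB image of 32 x 32 pixels
print(model(x).shape)            # torch.Size([1, 5]) -- class scores
```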
So let's start with the convolutional layers. The idea here is that we want to exploit the spatial structure by only connecting pixels in a neighborhood. This can still be expressed as a fully connected layer: if we write it as a matrix, we set every entry in the matrix to zero except for the connections that lie in the receptive field of the local filter kernel. This means that we can neglect the many connections over large spatial distances. Another trick is that you use small filters of size 3 x 3, 5 x 5, or 7 x 7, and you share the same weights over the entire image.
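To see this matrix view concretely, here is a small sketch that builds such a mostly-zero weight matrix for a one-dimensional signal; the 1-D setting and the concrete kernel are simplifying assumptions to keep the matrix readable.

```python
import numpy as np

def conv_as_matrix(kernel, n):
    # write a 1D 'valid' convolution as a weight matrix: every entry is
    # zero except inside each receptive field, and because of weight
    # sharing every row contains the same kernel, just shifted
    k = len(kernel)
    W = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        W[i, i:i + k] = kernel
    return W

kernel = np.array([1., 2., 1.])
W = conv_as_matrix(kernel, 6)
print(W)
# [[1. 2. 1. 0. 0. 0.]
#  [0. 1. 2. 1. 0. 0.]
#  [0. 0. 1. 2. 1. 0.]
#  [0. 0. 0. 1. 2. 1.]]

x = np.arange(6.)
print(W @ x)                                       # [ 4.  8. 12. 16.]
print(np.convolve(x, kernel[::-1], mode="valid"))  # the same result
```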